Impact of time-history terms on reservoir dynamics and prediction accuracy in echo state networks

The echo state network (ESN) is an excellent machine learning model for processing time-series data. This model achieves high training efficiency by utilising the response of a recurrent neural network, called a reservoir, to input signals. Introducing time-history terms into the neuron model of the reservoir is known to improve the time-series prediction performance of ESN, yet the reasons for this improvement have not been quantitatively explained in terms of reservoir dynamics characteristics. Therefore, we hypothesised that the performance enhancement brought about by time-history terms could be explained by delay capacity, a recently proposed metric for assessing the memory performance of reservoirs. To test this hypothesis, we conducted comparative experiments using ESN models with time-history terms, namely leaky integrator ESNs (LI-ESN) and chaotic echo state networks (ChESN). The results suggest that compared with ESNs without time-history terms, the reservoir dynamics of LI-ESN and ChESN can maintain diversity and stability while possessing higher delay capacity, leading to their superior performance. Explaining ESN performance through dynamical metrics is crucial for evaluating the numerous ESN architectures recently proposed from a general perspective and for the development of more sophisticated architectures, and this study contributes to such efforts.


Supplementary Note 1: Analysis of maximum Lyapunov exponent

Maximum Lyapunov exponent
The maximum Lyapunov exponent λ serves as a metric for measuring orbital instability in dynamical systems [1]. This metric is obtained through several equations, which can vary depending on the system under consideration. For a fully-leaky echo state network (ESN) or leaky integrator ESN (LI-ESN), whose reservoir has a single internal state x(t), the reference orbit s_k(t) is defined as

\[
s_k(t) = x(t). \qquad \mathrm{(S1)}
\]

In the case of the chaotic echo state network (ChESN), the system possesses three internal states, ξ(t), η(t), and ζ(t), and the reference orbit is given by

\[
s_k(t) = \left[\xi(t)^\top,\ \eta(t)^\top,\ \zeta(t)^\top\right]^\top. \qquad \mathrm{(S2)}
\]

For each initial condition, the perturbed orbit s'_k evolves over a time period τ, with t_l indicating the time at which the evolution of the perturbation starts. It starts from an initial condition obtained by adding a perturbation δ_0 to the reference orbit s_k(t):

\[
s'_k(t_l) = s_k(t_l) + \delta_0. \qquad \mathrm{(S3)}
\]

The difference between the perturbed and reference orbits after the time evolution, denoted by δ_τ, is calculated as

\[
\delta_\tau = s'_k(t_l + \tau) - s_k(t_l + \tau). \qquad \mathrm{(S4)}
\]

For k > 1, this initial perturbation is computed by renormalising the previous difference vector back to the original perturbation size:

\[
\delta_0 = \|\delta_0\| \, \frac{\delta_\tau}{\|\delta_\tau\|}. \qquad \mathrm{(S5)}
\]

Here, the first perturbation vector δ_0 (for k = 1) is taken as a small random vector. The Lyapunov exponent λ_k for each initial condition is determined based on the growth of this perturbation:

\[
\lambda_k = \frac{1}{\tau} \ln \frac{\|\delta_\tau\|}{\|\delta_0\|}. \qquad \mathrm{(S6)}
\]

Finally, λ is computed by averaging these λ_k values over all K different initial conditions:

\[
\lambda = \frac{1}{K} \sum_{k=1}^{K} \lambda_k. \qquad \mathrm{(S7)}
\]

This overall process gives us a measure of the system's sensitivity to initial conditions by capturing the degree of development of the perturbation over time.
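For concreteness, the procedure can be sketched in Python as follows. This is a minimal illustration, assuming a standard tanh reservoir update; the step function, the perturbation size, and the choice of taking the K initial conditions as successive segments along the reference orbit are illustrative assumptions rather than the exact implementation used in this study.

```python
import numpy as np

def esn_step(x, u, W, W_in, leak=1.0):
    """One leaky-integrator ESN update (leak=1.0 gives the fully-leaky ESN)."""
    return (1 - leak) * x + leak * np.tanh(W @ x + W_in * u)

def max_lyapunov(W, W_in, inputs, K=20, tau=50, delta0=1e-8,
                 washout=100, seed=None):
    """Estimate the maximum Lyapunov exponent by the perturbation method
    of Eqs. (S3)-(S7). `inputs` must have length >= washout + K * tau."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    x = np.zeros(N)
    # Wash out transients so the reference orbit s_k(t) lies on the attractor.
    for t in range(washout):
        x = esn_step(x, inputs[t], W, W_in)

    # First perturbation vector (k = 1): random direction with norm delta0.
    d = rng.standard_normal(N)
    d = delta0 * d / np.linalg.norm(d)

    lam = []
    t0 = washout
    for k in range(K):
        x_ref, x_pert = x.copy(), x + d        # s_k and s'_k = s_k + delta_0  (S3)
        for t in range(t0, t0 + tau):          # evolve both orbits for tau steps
            x_ref = esn_step(x_ref, inputs[t], W, W_in)
            x_pert = esn_step(x_pert, inputs[t], W, W_in)
        d_tau = x_pert - x_ref                 # difference after evolution  (S4)
        lam.append(np.log(np.linalg.norm(d_tau) / delta0) / tau)  # lambda_k (S6)
        # Renormalise for the next segment (k > 1 case, S5).
        d = delta0 * d_tau / np.linalg.norm(d_tau)
        x, t0 = x_ref, t0 + tau
    return np.mean(lam)                        # average over K conditions (S7)
```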

Scatter plot of the maximum Lyapunov exponent and performance
Figure S1 displays the scatter plot of the maximum Lyapunov exponent against time-series prediction performance. This figure clearly shows that the highest performance occurs when the maximum Lyapunov exponent is near zero, indicating the edge of stability [2]. Additionally, consistency typically achieves its maximum value when the Lyapunov exponent is zero or negative. Moreover, the covariance rank tends to be maximised when the maximum Lyapunov exponent is near zero or larger. Notably, at the edge of stability, where the reservoir's performance is optimised, both the covariance rank and consistency are observed to reach their maximum values. The maximum Lyapunov exponent is an indicator for evaluating the chaotic characteristics of dynamics, and a system with λ > 0 exhibits a chaotic state [1]. From the perspective of reservoir computing, the maximum Lyapunov exponent has been used as an index to identify the reservoir dynamics for which performance is optimised [2]. The diversity of reservoir dynamics tends to be higher when the maximum Lyapunov exponent is high; however, performance degrades once the Lyapunov exponent exceeds a certain threshold, because the reservoir's stability is compromised (i.e., the consistency between the input and output signals is not maintained). Therefore, optimal reservoir dynamics tend to be achieved at high maximum Lyapunov exponents, up to the point just before the stability of the reservoir is lost (λ ≈ 0). This point is commonly referred to as the 'edge of chaos', but Carroll argues that the term 'edge of stability' is more appropriate in the context of reservoir computing, as the essential dynamical characteristic is not the system's chaotic nature but rather the consistency between its input and output signals (the echo state property) [2].
Furthermore, achieving optimal performance is not solely dependent on the diversity and stability of the reservoir. Carroll's experiments indicate that the edge of stability does not necessarily correspond to optimal performance, suggesting that other dynamical characteristics may need to be simultaneously fulfilled to achieve optimal performance [2]. Following this finding, in this study we focus on memory, which, among such performance-related dynamical characteristics, is the one most plausibly related to time-history terms. Moreover, the levels of stability and diversity of reservoir dynamics cannot be gauged from the maximum Lyapunov exponent alone. In the main text, we therefore introduced metrics for independently measuring dynamic diversity and stability, namely covariance rank [3] and consistency [4,5], as we need to compare the reservoir dynamics of fully-leaky ESN, LI-ESN, and ChESN.
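As a rough illustration of these two metrics, the following Python sketch computes a covariance rank and a consistency estimate from reservoir state trajectories. The estimators used here (the numerical rank of the state covariance matrix, and the mean squared per-neuron correlation between two runs driven by the same input from different initial states) are plausible stand-ins consistent with [3-5], not necessarily the exact definitions used in the main text; `run_reservoir` is a hypothetical function returning the state time series.

```python
import numpy as np

def covariance_rank(states, tol=None):
    """Diversity metric: numerical rank of the covariance matrix of the
    reservoir state time series (rows = time steps, columns = neurons)."""
    X = states - states.mean(axis=0)
    cov = X.T @ X / (len(X) - 1)
    return np.linalg.matrix_rank(cov, tol=tol)

def consistency(run_reservoir, u, x0_a, x0_b, washout=100):
    """Stability metric: mean squared correlation between two reservoir
    responses to the same input, started from different initial states."""
    Xa = run_reservoir(u, x0_a)[washout:]   # state time series, run A
    Xb = run_reservoir(u, x0_b)[washout:]   # state time series, run B
    corrs = []
    for i in range(Xa.shape[1]):            # correlate each neuron's responses
        a, b = Xa[:, i], Xb[:, i]
        if a.std() < 1e-12 or b.std() < 1e-12:
            corrs.append(1.0 if np.allclose(a, b) else 0.0)
            continue
        corrs.append(np.corrcoef(a, b)[0, 1] ** 2)
    return float(np.mean(corrs))
```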

Fig. S1 Correspondence between time-series prediction performance and maximum Lyapunov exponents. This scatter plot illustrates the relationship between the normalised root mean square error (NRMSE) and the maximum Lyapunov exponent in time-series prediction tasks for both the Lorenz and Rössler systems. Each point on the scatter plot originates from grid-search results, which include the optimal parameters as obtained in Fig. 5 in the main text. Grid-search parameters are shown in Table 1 in the main text.

Fig. S2 Time-series prediction performance at optimal parameters obtained through grid search. The performance metric is the normalised root mean square error (NRMSE) between the target output and the predictions. The main hyperparameters affecting the performance of the reservoir in each model, namely the spectral radius and input scaling, are set to values that minimise the average NRMSE across 10 trials with varying seed values. For LI-ESN, ChESN, and simple ChESN (a simplified version of ChESN), the spectral radius and input scaling are fixed at the optimal values, and the time-history term that yielded the lowest NRMSE for each seed value is adopted (i.e., α_l in LI-ESN, k_f and k_r in ChESN, and k in simple ChESN). For simplicity, some parameters of ChESN are fixed without grid search (k_e = 0, α = 0.9, θ = 0). Error bars represent the standard deviation across the 10 trials.
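The selection procedure described in this caption can be summarised by the following minimal Python sketch. Here `train_and_evaluate` is a hypothetical callback that builds a reservoir with the given spectral radius, input scaling, and seed, trains the readout, and returns the test NRMSE; the normalisation of the NRMSE by the standard deviation of the target is an assumed convention.

```python
import numpy as np
from itertools import product

def nrmse(y_pred, y_true):
    """Normalised root mean square error (normalised by target std, assumed)."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2)) / np.std(y_true)

def grid_search(train_and_evaluate, radii, input_scales, seeds=range(10)):
    """Pick the (spectral radius, input scaling) pair that minimises the
    NRMSE averaged over trials with different seed values."""
    best_err, best_params = np.inf, None
    for rho, s_in in product(radii, input_scales):
        errs = [train_and_evaluate(rho, s_in, seed) for seed in seeds]
        if np.mean(errs) < best_err:
            best_err, best_params = np.mean(errs), (rho, s_in)
    return best_err, best_params
```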

Figure S2, using the same method as Fig. 3 in the main text, shows the time-series prediction performance of the reservoir. From this figure, it is evident that simple ChESN, while not matching LI-ESN, exhibits similar performance values. Moreover, when comparing simple ChESN with ChESN, simple ChESN outperformed in the Lorenz task. The inferior performance of simple ChESN compared with that of LI-ESN stems from the different application of time-history terms. Unlike LI-ESN, simple ChESN has decay coefficients in the neuron's internal state, making the internal state more prone to values greater than 1 or less than −1 and leading to saturation of the reservoir's firing state through the tanh activation function. This saturation likely had a negative impact on performance. Evaluation of the performance difference between simple ChESN and ChESN showed that in the Rössler task, ChESN performed better. This result seems unsurprising, given that simple ChESN adjusts only k, whereas ChESN adjusts both k_f and k_r. However, despite this difference in the number of adjustable parameters, simple ChESN outperformed ChESN in the Lorenz task. This could be attributed to the difference in the refractory scaling parameter (α = 0 in simple ChESN and α = 0.9 in ChESN). The setting α = 0.9 was adopted from our previous study, which achieved high performance in Mackey-Glass time-series prediction [6]. The Mackey-Glass time series used there, generated with a time-delay term τ = 32, exhibits a slow time scale, similar to the Rössler task in this study. This suggests that a larger α may be necessary for time-series data with slow time scales.
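To make the role of the decay coefficients concrete, the following sketch shows an Aihara-style chaotic neuron update consistent with the three internal states ξ, η, ζ and the parameters k_e, k_f, k_r, α, θ mentioned above; the exact ChESN update rule is given in the main text, so this should be read as an assumption-laden illustration. Because the decay acts on the pre-activation internal states rather than on the bounded firing state (as in LI-ESN), the summed internal state can drift well outside [−1, 1] and push tanh into saturation.

```python
import numpy as np

def chesn_step(xi, eta, zeta, x, u, W, W_in,
               k_e=0.0, k_f=0.9, k_r=0.9, alpha=0.9, theta=0.0):
    """One ChESN update with three decaying internal states (sketch of an
    Aihara-style chaotic neuron; see the main text for the exact model).
    xi: external-input state, eta: feedback state, zeta: refractory state."""
    xi = k_e * xi + W_in * u                 # decayed external-input term
    eta = k_f * eta + W @ x                  # decayed recurrent feedback term
    zeta = k_r * zeta - alpha * x + theta    # decayed refractoriness term
    x = np.tanh(xi + eta + zeta)             # firing state; saturates when the
                                             # internal sum grows beyond |1|
    return xi, eta, zeta, x
```

In this sketch, the simple ChESN would correspond to collapsing the decay coefficients to a single k with α = 0, consistent with the parameter settings stated above; with the decay applied before the tanh nonlinearity, large accumulated internal states drive the firing state into the saturated regime discussed in the text.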